
    Multimodal representation learning with neural networks

    Abstract: Representation learning methods have received a lot of attention from researchers and practitioners because of their successful application to complex problems in areas such as computer vision, speech recognition, and text processing [1]. Many of these promising results are due to the development of methods that automatically learn the representation of complex objects directly from large amounts of sample data [2]. These efforts have concentrated on data involving a single type of information (images, text, speech, etc.), despite data being naturally multimodal. Multimodality refers to the fact that the same real-world concept can be described by different views or data types. Automatic multimodal analysis faces three main challenges: feature learning and extraction, modeling of relationships between data modalities, and scalability to large multimodal collections [3, 4]. This research considers the problem of leveraging multiple sources of information, or data modalities, in neural networks. It defines a novel model called the gated multimodal unit (GMU), designed as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities. The GMU learns to decide how each modality influences the activation of the unit using multiplicative gates. It can be used as a building block for different kinds of neural networks and can be seen as a form of intermediate fusion. The model was evaluated on four supervised learning tasks in conjunction with fully connected and convolutional neural networks, and it achieved higher classification scores than other early and late fusion methods on the evaluated datasets. Strategies to understand how the model weighs each input were also explored: by measuring the correlation between gate activations and predictions, modalities could be associated with classes, and some classes turned out to be more strongly correlated with one particular modality. In genre prediction, for instance, the model associates visual information with animation movies, while textual information is more strongly associated with drama or romance movies. During this project, three new benchmark datasets were built and publicly released: the BCDR-F03 dataset, which contains 736 mammography images and serves as a benchmark for mass lesion classification; the MM-IMDb dataset, which contains around 27,000 movie plots and posters along with 50 metadata annotations and motivates new research in multimodal analysis; and the Goodreads dataset, a collection of 1,000 books that encourages research on success prediction based on book content. This research also facilitates reproducibility by releasing the source code of the proposed methods.
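    To make the gating mechanism concrete, below is a minimal PyTorch sketch of a two-modality GMU, following the bimodal formulation from the published GMU work: each modality is projected into a shared hidden space through a tanh nonlinearity, and a sigmoid gate computed from the concatenated inputs decides how much each modality contributes to the fused representation. The dimensions, variable names, and the choice of PyTorch are illustrative assumptions, not the thesis's exact implementation.

```python
import torch
import torch.nn as nn

class GatedMultimodalUnit(nn.Module):
    """Sketch of a two-modality gated multimodal unit (GMU).

    Each modality is projected into a shared hidden space with tanh;
    a sigmoid gate, computed from the concatenated raw inputs, weighs
    how much each modality contributes to the fused representation.
    """
    def __init__(self, visual_dim: int, text_dim: int, hidden_dim: int):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # Gate takes both raw inputs, outputs one weight per hidden unit.
        self.gate = nn.Linear(visual_dim + text_dim, hidden_dim)

    def forward(self, x_visual: torch.Tensor, x_text: torch.Tensor) -> torch.Tensor:
        h_v = torch.tanh(self.visual_proj(x_visual))   # visual candidate
        h_t = torch.tanh(self.text_proj(x_text))       # textual candidate
        z = torch.sigmoid(self.gate(torch.cat([x_visual, x_text], dim=-1)))
        # Multiplicative gate: per-unit convex combination of candidates.
        return z * h_v + (1 - z) * h_t

# Usage: fuse a hypothetical 4096-d image feature with a 300-d text embedding.
gmu = GatedMultimodalUnit(visual_dim=4096, text_dim=300, hidden_dim=512)
fused = gmu(torch.randn(8, 4096), torch.randn(8, 300))  # -> shape (8, 512)
```

    Because the gate activations z lie in (0, 1) per hidden unit, inspecting them per example is what enables the interpretability analysis the abstract describes, i.e., correlating gate values with predicted classes to see which modality a class relies on.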

    Representation learning for histopathology image analysis

    Abstract: Automatic methods for image representation and analysis have been successfully applied to several medical imaging problems, leading to the emergence of novel research areas such as digital pathology and bioimage informatics. The main challenge for these methods is dealing with the high visual variability of the biological structures present in the images, which widens the semantic gap between their visual appearance and their high-level meaning. In histopathology images in particular, visual variability is also related to the noise added by acquisition stages such as magnification, sectioning, and staining. Many efforts have focused on the careful selection of image representations to capture such variability. This approach requires expert knowledge as well as hand-engineered design to build good feature detectors that represent the relevant visual information. Current approaches to classical computer vision tasks have replaced such design by including the image representation itself as a new learning stage, called representation learning. This paradigm has outperformed state-of-the-art results in many pattern recognition tasks such as speech recognition, object detection, and image scene classification. The aim of this research was to explore and define a learning-based histopathology image representation strategy with interpretative capabilities. The main contribution is a novel approach to learning the image representation for cancer detection. The proposed approach learns the representation directly from a basal-cell carcinoma image collection in an unsupervised way and was extended to extract more complex features from low-level representations. Additionally, this research proposed the digital staining module, a complementary interpretability stage that supports diagnosis through visual identification of discriminant and semantic features. Experimental results showed a performance of 92% in F-score, improving on the state-of-the-art representation by 7%. This research concluded that representation learning improves both the generalization of the feature detectors and the performance on the basal-cell carcinoma detection task. As additional contributions, a bag-of-features image representation was extended and evaluated for Alzheimer's disease detection, obtaining 95% in terms of equal error classification rate. Also, a novel bag-of-features-based perspective for learning morphometric measures in cervical cells was presented and evaluated, obtaining promising results for predicting nucleus and cytoplasm areas.
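    For the bag-of-features representation the abstract builds on, the following is a minimal sketch of the generic pipeline: local descriptors from training images are clustered into a visual vocabulary, and each image is then encoded as a normalized histogram of visual-word occurrences. The use of scikit-learn k-means, the descriptor dimensionality, the vocabulary size, and the random stand-in data are illustrative assumptions, not the thesis's exact configuration.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptor_sets, n_words=256, seed=0):
    """Cluster local descriptors from a training corpus into visual words."""
    all_desc = np.vstack(descriptor_sets)
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(all_desc)

def bof_histogram(descriptors, vocabulary):
    """Encode one image as a normalized histogram of visual-word counts."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Usage with random stand-ins for local patch descriptors (e.g. 128-d):
train_desc = [np.random.rand(500, 128) for _ in range(10)]  # 10 training images
vocab = build_vocabulary(train_desc, n_words=64)
feature = bof_histogram(np.random.rand(400, 128), vocab)    # one test image
```

    The resulting fixed-length histogram can then be fed to any standard classifier, which is what makes this representation reusable across tasks such as the carcinoma, Alzheimer's, and cervical-cell experiments described above.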